TCGA-Assembler: Pipeline for TCGA Data Downloading, Assembling, and Processing (Supplementary Methods)

Authors

  • Yitan Zhu
  • Peng Qiu
  • Yuan Ji
Abstract

The Cancer Genome Atlas (TCGA) is supported by the National Cancer Institute and the National Human Genome Research Institute to chart the molecular landscape of tumor samples for more than 20 types of cancer [1-3]. TCGA has been generating multi-modal genomics, epigenomics, and proteomics data for thousands of cancer patients, providing unprecedented opportunities for researchers to systematically study cancer mechanisms at the molecular and regulatory layers. TCGA data come with different levels of access. While access to most level-1 and level-2 data is restricted, all level-3 TCGA data, as well as some de-identified patient clinical information (e.g., survival and drug treatments), are publicly available. Level-3 TCGA data are normalized measurements of genomics and epigenomics features, e.g., DNA copy number, DNA methylation, mRNA expression, miRNA expression, and protein expression. Such a wealth of information enables biologists, clinicians, and quantitative geneticists to address various important research questions. For example, the level-3 data can be used to infer the effects of multiple types of transcriptional regulators, such as DNA methylation and transcription factors, on gene expression. This type of investigation requires matched data, i.e., the same samples measured by all the relevant biological assays. We summarize the public TCGA data and report in Supplementary Table 1 the number of available samples measured by different assay platforms, and the number of patients with clinical information for the different cancer types.
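
The requirement for matched data across assays can be illustrated with a minimal Python sketch. It is not part of TCGA-Assembler; the function names `sample_key` and `matched_samples` and the barcodes below are hypothetical and chosen only for illustration. The sketch assumes that each measurement is labeled with a TCGA barcode and that the first 15 characters of the barcode (project, tissue source site, participant, and sample type code) identify the sample, so two measurements with the same 15-character prefix are treated as coming from the same sample.

```python
from functools import reduce
from typing import Dict, List


def sample_key(barcode: str) -> str:
    """Return the first 15 characters of a TCGA barcode.

    Assumption: this prefix (e.g. 'TCGA-A1-A0SB-01') identifies the
    participant plus the sample type, and is shared across assays.
    """
    return barcode[:15]


def matched_samples(assays: Dict[str, List[str]]) -> List[str]:
    """Return the sample keys that are present in every assay."""
    key_sets = [{sample_key(b) for b in barcodes} for barcodes in assays.values()]
    if not key_sets:
        return []
    # Intersect the per-assay key sets to keep only fully matched samples.
    return sorted(reduce(set.intersection, key_sets))


# Hypothetical barcodes, for illustration only.
assays = {
    "methylation": ["TCGA-A1-A0SB-01A-11D-A145-05", "TCGA-A1-A0SD-01A-11D-A112-05"],
    "mrna": ["TCGA-A1-A0SB-01A-11R-A144-07", "TCGA-A1-A0SE-01A-11R-A084-07"],
    "copy_number": ["TCGA-A1-A0SB-01A-11D-A141-01", "TCGA-A1-A0SD-01A-11D-A110-01"],
}

print(matched_samples(assays))
```

Only the sample measured by all three hypothetical assays survives the intersection. A real analysis would likely also need to handle multiple vials or aliquots per sample, which this sketch ignores.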

Similar resources

TCGA-Assembler 2: Software Pipeline for Retrieval and Processing of TCGA/CPTAC Data.

Motivation The Cancer Genome Atlas (TCGA) program has produced huge amounts of cancer genomics data providing unprecedented opportunities for research. In 2014, we developed TCGA-Assembler (Zhu et al., 2014), a software pipeline for retrieval and processing of public TCGA data. In 2016, TCGA data were transferred from the TCGA data portal to the Genomic Data Commons (GDC), which is supported by...

Alternative preprocessing of RNA-Sequencing data in The Cancer Genome Atlas leads to improved analysis results

MOTIVATION The Cancer Genome Atlas (TCGA) RNA-Sequencing data are used widely for research. TCGA provides 'Level 3' data, which have been processed using a pipeline specific to that resource. However, we have found using experimentally derived data that this pipeline produces gene-expression values that vary considerably across biological replicates. In addition, some RNA-Sequencing analysis to...

TCGA2STAT: simple TCGA data access for integrated statistical analysis in R

MOTIVATION Massive amounts of high-throughput genomics data profiled from tumor samples were made publicly available by the Cancer Genome Atlas (TCGA). RESULTS We have developed an open source software package, TCGA2STAT, to obtain the TCGA data, wrangle it, and pre-process it into a format ready for multivariate and integrated statistical analysis in the R environment. In a user-friendly for...

TCGA Expedition: A Data Acquisition and Management System for TCGA Data

BACKGROUND The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data are currently over 1.2 Petabyte in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteom...

TopFed: TCGA tailored federated query processing and linking to LOD

BACKGROUND The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinfo...


Publication date: 2014